CN115114597A

CN115114597A - Tracing watermark embedding and extracting method based on character information

Info

Publication number: CN115114597A
Application number: CN202210693843.9A
Authority: CN
Inventors: 陈明志; 梁镇; 施友安; 翁才杰; 姚宏玮; 许春耀; 张瑞
Original assignee: Beijing Beika Technology Co ltd
Current assignee: Beijing Beika Technology Co ltd
Priority date: 2022-06-19
Filing date: 2022-06-19
Publication date: 2022-09-27

Abstract

The invention discloses a traceability watermark embedding and extraction method based on text information. During embedding, a steganographic algorithm is used and the target font is replaced by fused fonts to obtain a stego carrier; during extraction, the watermark information is recovered from the font type and stroke count of the characters in the stego carrier data. The method uses the stroke count and font type of characters as the carrier and embeds the watermark by replacing the target font with fused fonts that are highly similar to it. This keeps the traceability watermark invisible while meeting the requirements of concealment and robustness; after sensitive information is photographed, captured in a screenshot, or screen-recorded, the watermark information can be extracted from the leaked medium and the source of the leak traced and located.

Description

Tracing watermark embedding and extracting method based on character information

Technical Field

The invention belongs to the technical field of information security, and particularly relates to a traceability watermark embedding and extracting method based on text information.

Background

A traceability watermark makes it possible to trace leaks of sensitive information. Common visible watermarks are easy to erase and tamper with, while conventional invisible watermarks have poor robustness and concealment and cannot meet the requirements of practical applications, so improvement is needed.

Disclosure of Invention

The invention aims to provide a traceability watermark embedding and extracting method based on character information, which can ensure that a traceability watermark is invisible and can meet the requirements of concealment and robustness.

In order to achieve the above purpose, the solution of the invention is:

A traceability watermark embedding method based on text information uses a steganographic algorithm to replace the target font with fused fonts, thereby obtaining a stego carrier.

The steganographic algorithm may be an information hiding method based on the (7,4) Hamming code, an LSB algorithm, an information hiding method based on matrix coding, or an information hiding method based on STC coding.

The method specifically comprises the following steps:

Step A1, select one font as the target font and collect $n$ fused fonts as candidate replacement fonts for the target font, where $n = 2^{\alpha} - 1$ and $\alpha$, an integer not less than 1, is the codeword width of the information carried by each character;

Step A2, let the watermark information be $L_4$ bits of binary data $M$, and let $X = \{x_0, x_1, \ldots, x_{2^{\alpha}-1}\}$ denote the font set, where $x_0$ is the target font and $x_j$ is the $j$-th fused font, $j = 1, 2, \ldots, 2^{\alpha} - 1$;

Step A3, let the carrier data contain $L_0$ characters; if the capacity condition is satisfied (the inequality is missing from this text), continue with the subsequent operations; otherwise return a message indicating insufficient capacity;

Step A4, extract the stroke count of each of the first $L_1$ characters, denoted $SN_i$, where $i = 0, 1, \ldots, L_1 - 1$, $L_1 = L_0 - L_0 \,\%\, 7$, and % denotes the remainder operation;

Step A5, compute the information represented by each character in the carrier, denoted $R = \{r_0, r_1, \ldots, r_{L_1-1}\}$, as follows:

$$r_i = SN_i \,\%\, 2^{\alpha}, \quad i = 0, 1, \ldots, L_1 - 1$$

Step A6, convert each element of $R$ into an $\alpha$-bit binary sequence, denoted $C = \{c_0, c_1, \ldots, c_{L_2-1}\}$ with $L_2 = \alpha L_1$, where $c_j$ takes the value

$$c_j = \left\lfloor \frac{r_{\lfloor j/\alpha \rfloor}}{2^{\beta-1}} \right\rfloor \,\%\, 2, \quad j = 0, 1, \ldots, L_2 - 1,$$

with $\beta = \alpha - j \,\%\, \alpha$ and $\lfloor \cdot \rfloor$ denoting rounding down;

Step A7, divide $C$ into $L_3$ sub-blocks of 7 bits each, every sub-block being represented by a row vector denoted $D_k$, $k = 0, 1, \ldots, L_3 - 1$, $L_3 = L_2 / 7$;

Step A8, expand the watermark information $M$ to $3L_3$ bits as the information to be embedded, denoted $M'$; $M'$ is the first $3L_3$ bits of $\lceil 3L_3 / L_4 \rceil$ concatenated copies of $M$, where $\lceil \cdot \rceil$ denotes rounding up;

Step A9, divide $M'$ into $L_3$ sub-blocks of 3 bits each, every sub-block being represented by a row vector denoted $m_k$, $k = 0, 1, \ldots, L_3 - 1$;

Step A10, compute the position of the bit to be modified in $D_k$ as

$$d_k = H \otimes D_k^{T} \oplus m_k^{T}.$$

If $d_k$ is the zero vector, the carrier data need not be modified and $D_k' = D_k$; otherwise, find the index of the column of the check matrix $H$ that equals $d_k$ and invert the element of $D_k$ at that position, the result being denoted $D_k'$. Repeat this operation for successive values of $k$ until the watermark information is completely embedded in the carrier. Here $\otimes$ denotes matrix-vector multiplication in which addition is replaced by the modulo-2 sum, $\oplus$ denotes the XOR operation, $m_k = [z_{k0}, z_{k1}, z_{k2}]$ is the $k$-th group of information to be embedded, $z_{ki} \in \{0, 1\}$, $i = 0, 1, 2$, and $H$ is the check matrix (its specific form is missing from this text);

Step A11, replace the corresponding $D_k$ in $C$ with $D_k'$ to obtain the stego data $C'$;

Step A12, divide $C'$ into $L_1 = L_2 / \alpha$ sub-blocks of $\alpha$ bits each, convert each sub-block into the corresponding decimal number, denoted $r_i'$, and replace the corresponding $r_i$ in $R$ with $r_i'$ to obtain $R'$;

Step A13, perform font replacement according to $R'$ and $R$: if $r_i' = r_i$, keep the original font; if $r_i' \neq r_i$, replace the font $x_0$ with the font $x_{\lambda_i}$, where $\lambda_i = (r_i' - r_i + 2^{\alpha}) \,\%\, 2^{\alpha}$, thereby obtaining the stego carrier.
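The single-block operation of step A10 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the specific check matrix is not reproduced in this text, so the sketch assumes the common form of the (7,4) Hamming check matrix whose p-th column is the binary representation of p. The names `H`, `syndrome` and `embed_block` are hypothetical.

```python
# ASSUMPTION: the patent's own check matrix is not reproduced in this text.
# This is the common (7,4) Hamming form: column p (1-based) is p in binary.
H = [
    [0, 0, 0, 1, 1, 1, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0, 1],
]


def syndrome(block):
    """Compute H (x) block^T with addition taken modulo 2 (3 bits)."""
    return [sum(h * b for h, b in zip(row, block)) % 2 for row in H]


def embed_block(block, message):
    """Embed a 3-bit message into a 7-bit block, flipping at most one bit."""
    # d_k = H (x) D_k^T XOR m_k^T
    d = [s ^ m for s, m in zip(syndrome(block), message)]
    stego = list(block)
    if d == [0, 0, 0]:
        return stego  # the block already encodes the message
    # Find the column of H equal to d and invert the bit at that position.
    columns = [[row[p] for row in H] for p in range(7)]
    stego[columns.index(d)] ^= 1
    return stego


if __name__ == "__main__":
    stego = embed_block([1, 0, 1, 1, 0, 0, 1], [1, 0, 1])
    print(stego, syndrome(stego))
```

Flipping the bit at the column equal to `d` changes the syndrome by exactly `d`, so the syndrome of the modified block equals the 3-bit message. With a differently ordered check matrix the flipped position would differ, but the argument is the same.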

A traceability watermark extraction method based on text information extracts the watermark information according to the font type and stroke count of the characters in the stego carrier data.

The method specifically comprises the following steps:

Step B1, let the stego carrier contain $L_0'$ characters; extract the stroke count and font type of each of the first $L_1'$ characters, denoted $SN_i'$ and $y_i$, where $y_i \in X$, $X = \{x_0, x_1, \ldots, x_{2^{\alpha}-1}\}$ is the same font set, consisting of the target font and the fused fonts, as in the embedding process, $i = 0, 1, \ldots, L_1' - 1$, $L_1' = L_0' - L_0' \,\%\, 7$, and % denotes the remainder operation;

Step B2, compute the information represented by the stroke count of each character in the stego carrier data, denoted $R' = \{r_0', r_1', \ldots, r_{L_1'-1}'\}$, as follows:

$$r_i' = SN_i' \,\%\, 2^{\alpha}, \quad i = 0, 1, \ldots, L_1' - 1$$

Step B3, from the information represented by the stroke count and from the font type, compute the information $R = \{r_0, r_1, \ldots, r_{L_1'-1}\}$ carried by the stego carrier:

$$r_i = (r_i' + \lambda_i) \,\%\, 2^{\alpha}, \quad i = 0, 1, \ldots, L_1' - 1,$$

where $\lambda_i = j$ when $y_i = x_j$;

Step B4, convert each element of $R$ into an $\alpha$-bit binary sequence, denoted $C' = \{c_0', c_1', \ldots, c_{L_2'-1}'\}$ with $L_2' = \alpha L_1'$, where $c_j'$ takes the value

$$c_j' = \left\lfloor \frac{r_{\lfloor j/\alpha \rfloor}}{2^{\beta-1}} \right\rfloor \,\%\, 2, \quad j = 0, 1, \ldots, L_2' - 1,$$

with $\beta = \alpha - j \,\%\, \alpha$ and $\lfloor \cdot \rfloor$ denoting rounding down;

Step B5, divide $C'$ into $L_3'$ sub-blocks of 7 bits each, every sub-block being represented by a row vector denoted $D_k'$, $k = 0, 1, \ldots, L_3' - 1$, $L_3' = L_2' / 7$;

Step B6, compute the watermark information carried by $D_k'$, represented by a row vector denoted $m_k'$, the set of which is denoted $M' = \{m_0', m_1', \ldots, m_{L_3'-1}'\}$; $m_k'$ is computed as

$$m_k' = \left( H \otimes D_k'^{\,T} \right)^{T},$$

where $m_k' = [z_{k0}, z_{k1}, z_{k2}]$, $z_{ki} \in \{0, 1\}$, $i = 0, 1, 2$;

Step B7, divide the first $3L_3' - 3L_3' \,\%\, L_4$ bits of $M'$ into $L_5$ blocks of $L_4$ bits each, every block being represented by a row vector denoted $wk_h = [\xi_{h0}, \xi_{h1}, \ldots, \xi_{h(L_4-1)}]$, where $\xi_{hi} \in M'$, $h = 0, 1, \ldots, L_5 - 1$, and $L_5 = \lfloor 3L_3' / L_4 \rfloor$;

Step B8, construct a matrix $M''$ whose rows are the vectors $wk_h$, count the occurrences of each element value in each column vector of $M''$, and denote the most frequent element of column $g$ as $\xi_g'$; the row vector formed by the most frequent element of each column is the extracted watermark information, denoted $WK = [\xi_0', \xi_1', \ldots, \xi_{L_4-1}']$. The specific form of $M''$ is as follows:

$$M'' = \begin{bmatrix} wk_0 \\ wk_1 \\ \vdots \\ wk_{L_5-1} \end{bmatrix}$$

When the same watermark is embedded cyclically, the font type of the character preceding each repeated watermark embedding position is changed to a font (its symbol is missing from this text) that has the same characteristics as the fonts in the fused font library $X$; the remainder of this sentence is garbled in the source.
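The column-wise vote of steps B7 and B8 can be illustrated with a short Python sketch. It assumes the extracted bits of $M'$ are already available as a flat list; the function name `recover_watermark` is hypothetical. The text does not say how ties are broken, so this sketch simply keeps the value that occurs first in the column.

```python
from collections import Counter


def recover_watermark(extracted_bits, watermark_len):
    """Split the extracted bits into complete copies and vote per column."""
    copies = len(extracted_bits) // watermark_len  # L5; leftover bits are dropped
    rows = [
        extracted_bits[h * watermark_len:(h + 1) * watermark_len]
        for h in range(copies)
    ]
    # zip(*rows) iterates over the columns of the matrix M''.
    return [Counter(column).most_common(1)[0][0] for column in zip(*rows)]


if __name__ == "__main__":
    # Three noisy copies of a 4-bit watermark followed by two leftover bits.
    bits = [1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0]
    print(recover_watermark(bits, 4))
```

Because every bit of the watermark is decided by a majority over all embedded copies, isolated recognition errors in individual characters need not corrupt the recovered watermark.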

After the scheme is adopted, the invention has the following beneficial effects:

(1) The invention combines fused fonts with steganography based on the (7,4) Hamming code to provide a new invisible traceability watermarking technique. The watermark generated by the method is invisible to the human eye; after sensitive information is photographed, captured in a screenshot, or screen-recorded, the watermark information can be extracted from the leaked medium and the source of the leak tracked and located;

(2) The stroke count of each character serves as the carrier for embedding the traceability watermark, so different characters of the same font type represent different carrier information. Compared with a scheme that directly replaces fonts (in which different characters of the same font type represent the same carrier information), the carrier data are more diverse. On this basis, steganography based on the (7,4) Hamming code markedly improves embedding efficiency and reduces the amount of carrier modification during embedding;

(3) The invention designs a distinctive font replacement scheme in which stego carriers containing the same information can correspond to different font types. This disrupts, to some degree, the statistical regularity of font types in the stego carrier data after the watermark is embedded, and strengthens the security of the watermark information.
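The efficiency claim in (2) can be checked with a short calculation. It is a sketch under an assumption not stated in the text: the message bits are uniformly random, so the difference between a block's syndrome and its 3-bit message is zero with probability 1/8. The symbol $e$ for embedding efficiency (bits embedded per changed carrier bit) is introduced here for illustration.

```latex
\mathbb{E}[\text{changes per 7-bit block}] = 0 \cdot \tfrac{1}{8} + 1 \cdot \tfrac{7}{8} = \tfrac{7}{8},
\qquad
e_{(7,4)} = \frac{3}{7/8} = \frac{24}{7} \approx 3.43

e_{\text{LSB replacement}} = \frac{1}{1/2} = 2
```

Under the same assumption, plain LSB replacement embeds one bit per carrier bit and changes it half the time. Since one flipped bit changes exactly one symbol $r_i$, each 7-bit block requires replacing the font of at most one character.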

Drawings

Fig. 1 is a flow chart of watermark embedding in the present invention;

Fig. 2 is a flow chart of watermark extraction in the present invention;

Fig. 3 is a schematic diagram of the information correspondence represented by a character and its font type when α = 2.

Detailed Description

The technical solution and the advantages of the present invention will be described in detail with reference to the accompanying drawings.

The invention provides a traceability watermarking method based on text information, where the text information is the stroke count and font type of the characters. The method consists mainly of watermark embedding and watermark extraction. Embedding replaces the target font with fused fonts (Tian Y. zi2zi: Master Chinese calligraphy with conditional adversarial networks, 2017; He K, Zhang X, Ren S, et al. Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 770-778) and applies an information hiding technique based on the (7,4) Hamming code (a 2015 Chinese-language reference whose bibliographic details are garbled in this text) to improve the embedding efficiency of the watermark. Extraction recovers the watermark information from the font type and stroke count of the characters in the stego carrier data.

As shown in fig. 1 and fig. 2, the specific steps of watermark embedding and extracting are as follows:

1. Watermark embedding

1) Select one font as the target font and collect $n$ fused fonts as candidate replacement fonts for the target font. The fused fonts must differ from the target font yet remain highly similar to it: the human eye cannot distinguish the fused fonts from the target font, but a machine learning algorithm can identify the specific type of the target font and of each candidate replacement font. Here $n = 2^{\alpha} - 1$, where $\alpha$, an integer not less than 1, is the codeword width of the information carried by each character; its specific value can be set according to actual requirements;

2) Let the watermark information be $L_4$ bits of binary data $M$, and let $X = \{x_0, x_1, \ldots, x_{2^{\alpha}-1}\}$ denote the font set, where $x_0$ is the target font, i.e. the font of the original text, and $x_j$ is the $j$-th fused font, $j = 1, 2, \ldots, 2^{\alpha} - 1$;

3) Let the carrier data contain $L_0$ characters; if the capacity condition is satisfied (the inequality is missing from this text), continue with the subsequent operations; otherwise return a message indicating insufficient capacity;

4) Extract the stroke count of each of the first $L_1$ characters, denoted $SN_i$, where $i = 0, 1, \ldots, L_1 - 1$, $L_1 = L_0 - L_0 \,\%\, 7$, and % denotes the remainder operation;

5) Compute the information represented by each character in the carrier, denoted $R = \{r_0, r_1, \ldots, r_{L_1-1}\}$, as follows:

$$r_i = SN_i \,\%\, 2^{\alpha}, \quad i = 0, 1, \ldots, L_1 - 1$$

6) Convert each element of $R$ into an $\alpha$-bit binary sequence, denoted $C = \{c_0, c_1, \ldots, c_{L_2-1}\}$ with $L_2 = \alpha L_1$, where $c_j$ takes the value

$$c_j = \left\lfloor \frac{r_{\lfloor j/\alpha \rfloor}}{2^{\beta-1}} \right\rfloor \,\%\, 2, \quad j = 0, 1, \ldots, L_2 - 1,$$

with $\beta = \alpha - j \,\%\, \alpha$ and $\lfloor \cdot \rfloor$ denoting rounding down.

7) Divide $C$ into $L_3$ sub-blocks of 7 bits each, every sub-block being represented by a row vector denoted $D_k$, $k = 0, 1, \ldots, L_3 - 1$, $L_3 = L_2 / 7$;

8) Expand the watermark information $M$ to $3L_3$ bits as the information to be embedded, denoted $M'$; $M'$ is the first $3L_3$ bits of $\lceil 3L_3 / L_4 \rceil$ concatenated copies of $M$, where $\lceil \cdot \rceil$ denotes rounding up;

9) Divide $M'$ into $L_3$ sub-blocks of 3 bits each, every sub-block being represented by a row vector denoted $m_k$, $k = 0, 1, \ldots, L_3 - 1$;

10) Compute the position of the bit to be modified in $D_k$ as

$$d_k = H \otimes D_k^{T} \oplus m_k^{T}.$$

If $d_k$ is the zero vector, the carrier data need not be modified and $D_k' = D_k$; otherwise, find the index of the column of the check matrix $H$ that equals $d_k$ and invert the element of $D_k$ at that position, the result being denoted $D_k'$. Repeat this operation for successive values of $k$ until the watermark information is completely embedded in the carrier. Here $\otimes$ denotes matrix-vector multiplication in which addition is replaced by the modulo-2 sum, $\oplus$ denotes the XOR operation, $m_k = [z_{k0}, z_{k1}, z_{k2}]$ is the $k$-th group of information to be embedded, $z_{ki} \in \{0, 1\}$, $i = 0, 1, 2$, and $H$ is the check matrix (its specific form is missing from this text);

11) Replace the corresponding $D_k$ in $C$ with $D_k'$ to obtain the stego data $C'$;

12) Divide $C'$ into $L_1 = L_2 / \alpha$ sub-blocks of $\alpha$ bits each, convert each sub-block into the corresponding decimal number, denoted $r_i'$, and replace the corresponding $r_i$ in $R$ with $r_i'$ to obtain $R'$;

13) Perform font replacement according to $R'$ and $R$: if $r_i' = r_i$, keep the original font; if $r_i' \neq r_i$, replace the font $x_0$ with the font $x_{\lambda_i}$, where $\lambda_i = (r_i' - r_i + 2^{\alpha}) \,\%\, 2^{\alpha}$.
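The data preparation in steps 4) to 9) of the embedding procedure can be sketched in Python as follows. The helper names `carrier_bits`, `expand_watermark` and `split` are hypothetical, and the stroke counts in the usage lines are made-up sample values. The bit conversion is written in the floor-and-remainder form implied by the definition of β (most significant bit first), which is an interpretation of the garbled formula.

```python
def carrier_bits(stroke_counts, alpha):
    """Steps 4-6: stroke counts -> symbols R -> MSB-first bit sequence C."""
    l1 = len(stroke_counts) - len(stroke_counts) % 7  # L1 = L0 - L0 % 7
    symbols = [sn % 2 ** alpha for sn in stroke_counts[:l1]]
    bits = []
    for j in range(alpha * l1):  # L2 = alpha * L1
        beta = alpha - j % alpha
        bits.append(symbols[j // alpha] // 2 ** (beta - 1) % 2)
    return symbols, bits


def expand_watermark(watermark, l3):
    """Step 8: repeat M and keep the first 3 * L3 bits."""
    copies = -(-3 * l3 // len(watermark))  # ceiling of 3 * L3 / L4
    return (watermark * copies)[:3 * l3]


def split(bits, size):
    """Steps 7 and 9: cut a bit list into consecutive sub-blocks."""
    return [bits[i:i + size] for i in range(0, len(bits), size)]


if __name__ == "__main__":
    symbols, bits = carrier_bits([8, 5, 12, 3, 9, 7, 10, 4], alpha=2)
    blocks = split(bits, 7)                                   # D_k
    groups = split(expand_watermark([1, 0, 1, 1], len(blocks)), 3)  # m_k
    print(symbols, blocks, groups)
```

Because $L_1$ is a multiple of 7, the bit sequence always splits into whole 7-bit blocks, whatever the value of α.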

2. Watermark extraction

1) Extract the text information from the stego carrier. Let the stego carrier contain $L_0'$ characters; extract the stroke count and font type of each of the first $L_1'$ characters, denoted $SN_i'$ and $y_i$, where $y_i \in X$, $X = \{x_0, x_1, \ldots, x_{2^{\alpha}-1}\}$ is the same font set, consisting of the target font and the fused fonts, as in the embedding process, $i = 0, 1, \ldots, L_1' - 1$, $L_1' = L_0' - L_0' \,\%\, 7$, and % denotes the remainder operation;

2) Compute the information represented by the stroke count of each character in the stego carrier data, denoted $R' = \{r_0', r_1', \ldots, r_{L_1'-1}'\}$, as follows:

$$r_i' = SN_i' \,\%\, 2^{\alpha}, \quad i = 0, 1, \ldots, L_1' - 1$$

3) From the information represented by the stroke count and from the font type, compute the information $R = \{r_0, r_1, \ldots, r_{L_1'-1}\}$ carried by the stego carrier:

$$r_i = (r_i' + \lambda_i) \,\%\, 2^{\alpha}, \quad i = 0, 1, \ldots, L_1' - 1,$$

where $\lambda_i = j$ when $y_i = x_j$;

4) Convert each element of $R$ into an $\alpha$-bit binary sequence, denoted $C' = \{c_0', c_1', \ldots, c_{L_2'-1}'\}$ with $L_2' = \alpha L_1'$, where $c_j'$ takes the value

$$c_j' = \left\lfloor \frac{r_{\lfloor j/\alpha \rfloor}}{2^{\beta-1}} \right\rfloor \,\%\, 2, \quad j = 0, 1, \ldots, L_2' - 1,$$

with $\beta = \alpha - j \,\%\, \alpha$ and $\lfloor \cdot \rfloor$ denoting rounding down.

5) Divide $C'$ into $L_3'$ sub-blocks of 7 bits each, every sub-block being represented by a row vector denoted $D_k'$, $k = 0, 1, \ldots, L_3' - 1$, $L_3' = L_2' / 7$;

6) Compute the watermark information carried by $D_k'$, represented by a row vector denoted $m_k'$, the set of which is denoted $M' = \{m_0', m_1', \ldots, m_{L_3'-1}'\}$; $m_k'$ is computed as

$$m_k' = \left( H \otimes D_k'^{\,T} \right)^{T},$$

where $m_k' = [z_{k0}, z_{k1}, z_{k2}]$, $z_{ki} \in \{0, 1\}$, $i = 0, 1, 2$;

7) Divide the first $3L_3' - 3L_3' \,\%\, L_4$ bits of $M'$ into $L_5$ blocks of $L_4$ bits each, every block being represented by a row vector denoted $wk_h = [\xi_{h0}, \xi_{h1}, \ldots, \xi_{h(L_4-1)}]$, where $\xi_{hi} \in M'$, $h = 0, 1, \ldots, L_5 - 1$, and $L_5 = \lfloor 3L_3' / L_4 \rfloor$;

8) Construct a matrix $M''$ whose rows are the vectors $wk_h$, count the occurrences of each element value in each column vector of $M''$, and denote the most frequent element of column $g$ as $\xi_g'$; the row vector formed by the most frequent element of each column is the extracted watermark information, denoted $WK = [\xi_0', \xi_1', \ldots, \xi_{L_4-1}']$. The specific form of $M''$ is as follows:

$$M'' = \begin{bmatrix} wk_0 \\ wk_1 \\ \vdots \\ wk_{L_5-1} \end{bmatrix}$$

When the same watermark is embedded cyclically, the font type of the character preceding each repeated watermark embedding position can be modified.
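Steps 1) to 6) of the extraction procedure can be sketched in Python as follows. The sketch rests on two assumptions not confirmed by this text: font recognition has already produced, for each character, the index j of its font $x_j$ (the list `font_indices` is hypothetical), and the check matrix has the common (7,4) Hamming form, since the patent's own matrix is not reproduced here.

```python
# ASSUMPTION: common (7,4) Hamming check matrix; the patent's own H is not
# reproduced in this text. It must match the matrix used during embedding.
H = [
    [0, 0, 0, 1, 1, 1, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0, 1],
]


def extract_bits(stroke_counts, font_indices, alpha):
    """Return the extracted bit stream M' (before the majority vote)."""
    l1 = len(stroke_counts) - len(stroke_counts) % 7
    # Steps 2-3: r_i = (SN_i' % 2^alpha + lambda_i) % 2^alpha
    carried = [
        (sn % 2 ** alpha + lam) % 2 ** alpha
        for sn, lam in zip(stroke_counts[:l1], font_indices[:l1])
    ]
    # Step 4: alpha bits per symbol, most significant bit first.
    bits = [(r >> (alpha - 1 - b)) & 1 for r in carried for b in range(alpha)]
    # Steps 5-6: the syndrome of each 7-bit block is a 3-bit message group.
    message = []
    for k in range(len(bits) // 7):
        block = bits[7 * k:7 * k + 7]
        message.extend(sum(h * c for h, c in zip(row, block)) % 2 for row in H)
    return message


if __name__ == "__main__":
    # Seven characters; only the third one uses fused font x_1.
    print(extract_bits([8, 5, 12, 3, 9, 7, 10], [0, 0, 1, 0, 0, 0, 0], alpha=2))
```

The returned bit stream would then be cut into copies of length $L_4$ and resolved by the column-wise vote of steps 7) and 8).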

Fig. 3 shows a schematic diagram of information correspondence represented by a character and its font type when α is 2.
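Fig. 3 itself is not reproduced in this text. As a sketch of the correspondence it describes, the following Python snippet derives the α = 2 mapping directly from the formulas $r_i = (r_i' + \lambda_i) \,\%\, 2^{\alpha}$ and $\lambda_i = (r_i' - r_i + 2^{\alpha}) \,\%\, 2^{\alpha}$; the table layout and the function names are illustrative assumptions rather than the figure's actual content.

```python
ALPHA = 2
MODULUS = 2 ** ALPHA  # four fonts x0..x3, four possible carried values


def carried_value(stroke_count, font_index):
    """Value read at extraction: stroke-count symbol shifted by the font index."""
    return (stroke_count % MODULUS + font_index) % MODULUS


def font_index_for(stroke_count, target_value):
    """Font index chosen at embedding so the character carries target_value."""
    return (target_value - stroke_count % MODULUS + MODULUS) % MODULUS


# Rows: stroke count modulo 4. Columns: font x0 (target), x1, x2, x3.
table = [[carried_value(s, j) for j in range(MODULUS)] for s in range(MODULUS)]

if __name__ == "__main__":
    for s, row in enumerate(table):
        cells = ", ".join(f"x{j} -> {v}" for j, v in enumerate(row))
        print(f"strokes % 4 = {s}: {cells}")
```

Each carried value appears once in every row, so the same value is expressed by different fonts depending on the character's stroke count, which is the property described as weakening the statistical regularity of font types.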

It should be noted that the steganography based on the (7,4) Hamming code used in the watermark embedding and extraction processes of the method can be replaced as needed by other suitable steganographic algorithms, such as the LSB algorithm, an information hiding method based on matrix coding, or an information hiding method based on STC coding.

In summary, the traceability watermark embedding and extraction method based on text information uses the stroke count and font type of characters as the carrier and defines a distinctive font replacement rule: the watermark is embedded by replacing the target font with fused fonts that are highly similar to it. The method keeps the traceability watermark invisible while meeting the requirements of concealment and robustness; after sensitive information is photographed, captured in a screenshot, or screen-recorded, the watermark information can be extracted from the leaked medium and the source of the leak traced and located.

The above embodiments are only for illustrating the technical idea of the present invention, and the protection scope of the present invention is not limited thereby, and any modifications made on the basis of the technical scheme according to the technical idea of the present invention fall within the protection scope of the present invention.

Claims

1. A traceability watermark embedding method based on text information, characterized in that: a steganographic algorithm is used to replace the target font with fused fonts to obtain a stego carrier.

2. The traceability watermark embedding method based on text information according to claim 1, characterized in that: the steganographic algorithm is an information hiding method based on the (7,4) Hamming code, an LSB algorithm, an information hiding method based on matrix coding, or an information hiding method based on STC coding.

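3. The traceability watermark embedding method based on text information according to claim 1, characterized by comprising the following steps:

Step A1, select one font as the target font and collect $n$ fused fonts as candidate replacement fonts for the target font, where $n = 2^{\alpha} - 1$ and $\alpha$, an integer not less than 1, is the codeword width of the information carried by each character;

Step A2, let the watermark information be $L_4$ bits of binary data $M$, and let $X = \{x_0, x_1, \ldots, x_{2^{\alpha}-1}\}$ denote the font set, where $x_0$ is the target font and $x_j$ is the $j$-th font, $j = 1, 2, \ldots, 2^{\alpha} - 1$;

Step A3, let the carrier data contain $L_0$ characters; if the capacity condition is satisfied (the inequality is missing from this text), continue with the subsequent operations; otherwise return a message indicating insufficient capacity;

Step A4, extract the stroke count of each of the first $L_1$ characters, denoted $SN_i$, where $i = 0, 1, \ldots, L_1 - 1$, $L_1 = L_0 - L_0 \,\%\, 7$, and % denotes the remainder operation;

Step A5, compute the information represented by each character in the carrier, denoted $R = \{r_0, r_1, \ldots, r_{L_1-1}\}$, as follows:

$$r_i = SN_i \,\%\, 2^{\alpha}, \quad i = 0, 1, \ldots, L_1 - 1$$

Step A6, convert each element of $R$ into an $\alpha$-bit binary sequence, denoted $C = \{c_0, c_1, \ldots, c_{L_2-1}\}$ with $L_2 = \alpha L_1$, where $c_j$ takes the value

$$c_j = \left\lfloor \frac{r_{\lfloor j/\alpha \rfloor}}{2^{\beta-1}} \right\rfloor \,\%\, 2, \quad j = 0, 1, \ldots, L_2 - 1,$$

with $\beta = \alpha - j \,\%\, \alpha$ and $\lfloor \cdot \rfloor$ denoting rounding down;

Step A7, divide $C$ into $L_3$ sub-blocks of 7 bits each, every sub-block being represented by a row vector denoted $D_k$, $k = 0, 1, \ldots, L_3 - 1$, $L_3 = L_2 / 7$;

Step A8, expand the watermark information $M$ to $3L_3$ bits as the information to be embedded, denoted $M'$; $M'$ is the first $3L_3$ bits of $\lceil 3L_3 / L_4 \rceil$ concatenated copies of $M$, where $\lceil \cdot \rceil$ denotes rounding up;

Step A9, divide $M'$ into $L_3$ sub-blocks of 3 bits each, every sub-block being represented by a row vector denoted $m_k$, $k = 0, 1, \ldots, L_3 - 1$;

Step A10, compute the position of the bit to be modified in $D_k$ as

$$d_k = H \otimes D_k^{T} \oplus m_k^{T}.$$

If $d_k$ is the zero vector, the carrier data need not be modified and $D_k' = D_k$; otherwise, find the index of the column of the check matrix $H$ that equals $d_k$ and invert the element of $D_k$ at that position, the result being denoted $D_k'$; repeat this operation for successive values of $k$ until the watermark information is completely embedded in the carrier; here $\otimes$ denotes matrix-vector multiplication in which addition is replaced by the modulo-2 sum, $\oplus$ denotes the XOR operation, $m_k = [z_{k0}, z_{k1}, z_{k2}]$ is the $k$-th group of information to be embedded, $z_{ki} \in \{0, 1\}$, $i = 0, 1, 2$, and $H$ is the check matrix (its specific form is missing from this text);

Step A11, replace the corresponding $D_k$ in $C$ with $D_k'$ to obtain the stego data $C'$;

Step A12, divide $C'$ into $L_1 = L_2 / \alpha$ sub-blocks of $\alpha$ bits each, convert each sub-block into the corresponding decimal number, denoted $r_i'$, and replace the corresponding $r_i$ in $R$ with $r_i'$ to obtain $R'$;

Step A13, perform font replacement according to $R'$ and $R$: if $r_i' = r_i$, keep the original font; if $r_i' \neq r_i$, replace the font $x_0$ with the font $x_{\lambda_i}$, where $\lambda_i = (r_i' - r_i + 2^{\alpha}) \,\%\, 2^{\alpha}$, thereby obtaining the stego carrier.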

4. A traceability watermark extraction method based on text information, characterized in that: the watermark information is extracted according to the font type and stroke count of the characters in the stego carrier data.

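5. The traceability watermark extraction method based on text information according to claim 4, characterized by comprising the following steps:

Step B1, let the stego carrier contain $L_0'$ characters; extract the stroke count and font type of each of the first $L_1'$ characters, denoted $SN_i'$ and $y_i$, where $y_i \in X$, $X = \{x_0, x_1, \ldots, x_{2^{\alpha}-1}\}$ is the same font set, consisting of the target font and the fused fonts, as in the embedding process, $i = 0, 1, \ldots, L_1' - 1$, $L_1' = L_0' - L_0' \,\%\, 7$, and % denotes the remainder operation;

Step B2, compute the information represented by the stroke count of each character in the stego carrier data, denoted $R' = \{r_0', r_1', \ldots, r_{L_1'-1}'\}$, as follows:

$$r_i' = SN_i' \,\%\, 2^{\alpha}, \quad i = 0, 1, \ldots, L_1' - 1$$

Step B3, from the information represented by the stroke count and from the font type, compute the information $R = \{r_0, r_1, \ldots, r_{L_1'-1}\}$ carried by the stego carrier:

$$r_i = (r_i' + \lambda_i) \,\%\, 2^{\alpha}, \quad i = 0, 1, \ldots, L_1' - 1,$$

where $\lambda_i = j$ when $y_i = x_j$;

Step B4, convert each element of $R$ into an $\alpha$-bit binary sequence, denoted $C' = \{c_0', c_1', \ldots, c_{L_2'-1}'\}$ with $L_2' = \alpha L_1'$, where $c_j'$ takes the value

$$c_j' = \left\lfloor \frac{r_{\lfloor j/\alpha \rfloor}}{2^{\beta-1}} \right\rfloor \,\%\, 2, \quad j = 0, 1, \ldots, L_2' - 1,$$

with $\beta = \alpha - j \,\%\, \alpha$ and $\lfloor \cdot \rfloor$ denoting rounding down;

Step B5, divide $C'$ into $L_3'$ sub-blocks of 7 bits each, every sub-block being represented by a row vector denoted $D_k'$, $k = 0, 1, \ldots, L_3' - 1$, $L_3' = L_2' / 7$;

Step B6, compute the watermark information carried by $D_k'$, represented by a row vector denoted $m_k'$, the set of which is denoted $M' = \{m_0', m_1', \ldots, m_{L_3'-1}'\}$; $m_k'$ is computed as

$$m_k' = \left( H \otimes D_k'^{\,T} \right)^{T},$$

where $m_k' = [z_{k0}, z_{k1}, z_{k2}]$, $z_{ki} \in \{0, 1\}$, $i = 0, 1, 2$;

Step B7, divide the first $3L_3' - 3L_3' \,\%\, L_4$ bits of $M'$ into $L_5$ blocks of $L_4$ bits each, every block being represented by a row vector denoted $wk_h = [\xi_{h0}, \xi_{h1}, \ldots, \xi_{h(L_4-1)}]$, where $\xi_{hi} \in M'$, $h = 0, 1, \ldots, L_5 - 1$, and $L_5 = \lfloor 3L_3' / L_4 \rfloor$;

Step B8, construct a matrix $M''$ whose rows are the vectors $wk_h$, count the occurrences of each element value in each column vector of $M''$, and denote the most frequent element of column $g$ as $\xi_g'$; the row vector formed by the most frequent element of each column is the extracted watermark information, denoted $WK = [\xi_0', \xi_1', \ldots, \xi_{L_4-1}']$; the specific form of $M''$ is as follows:

$$M'' = \begin{bmatrix} wk_0 \\ wk_1 \\ \vdots \\ wk_{L_5-1} \end{bmatrix}$$

When the same watermark is embedded cyclically, the font type of the character preceding each repeated watermark embedding position is modified.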